73 research outputs found

Grammar and processing of order and dependency: a categorial approach

Author: Hepple Mark
Publication venue: The University of Edinburgh

Compacting the Penn Treebank Grammar

Author: Gaizauskas Robert
Hepple Mark
Krotov Alexander
Wilks Yorick
Publication date: 01/01/1998

Treebanks, such as the Penn Treebank (PTB), offer a simple approach to obtaining a broad coverage grammar: one can simply read the grammar off the parse trees in the treebank. While such a grammar is easy to obtain, a square-root rate of growth of the rule set with corpus size suggests that the derived grammar is far from complete and that much more treebanked text would be required to obtain a complete grammar, if one exists at some limit. However, we offer an alternative explanation in terms of the underspecification of structures within the treebank. This hypothesis is explored by applying an algorithm to compact the derived grammar by eliminating redundant rules, that is, rules whose right-hand sides can be parsed by other rules. The size of the resulting compacted grammar, which is significantly less than that of the full treebank grammar, is shown to approach a limit. However, such a compacted grammar does not yield very good performance figures. A version of the compaction algorithm taking rule probabilities into account is proposed, which is argued to be more linguistically motivated. Combined with simple thresholding, this method can be used to give a 58% reduction in grammar size without significant change in parsing performance, and can produce a 69% reduction with some gain in recall, but a loss in precision.

Comment: 5 pages, 2 figures

arXiv.org e-Print Archive

CiteSeerX

Crossref

Experiments in Structure-Preserving Grammar Compaction

Author: Hepple Mark
van Genabith Josef
Publication date: 01/01/2000

Structure-preserving grammar compaction (SPC) is a simple CFG compaction technique originally described in (van Genabith et al., 1999a, 1999b). It works by generalising category labels and in so doing plugs holes in the grammar. To date the method has been tested on small corpora only. In the present research we apply SPC to a large grammar extracted from the Penn Treebank and examine its effects on treebank grammar size and on rule accession rates (as an indicator of grammar completeness).

1 Introduction

Treebanks and resources compiled from treebanks are potentially very useful in NLP. Grammars extracted from treebanks, so-called treebank grammars (Charniak, 1996), can form the basis of large coverage NLP systems. Such treebank grammars, however, can suffer from several shortcomings: they commonly feature a large number of flat, highly specific rules that may be rarely used, with ensuing costs for processing (load) under the grammar

CiteSeerX

Irish Universities

DCU Online Research Access Service

Igbo-English Machine Translation: An Evaluation Benchmark

Author: Chinedu Uchechukwu
Ezeani Ignatius
Hepple Mark
Onyenwe Ikechukwu E.
Rayson Paul
Publication date: 01/04/2020

Although researchers and practitioners are pushing the boundaries and enhancing the capacities of NLP tools and methods, work on African languages is lagging. A lot of the focus is on well-resourced languages such as English, Japanese, German, French, Russian and Mandarin Chinese. Over 97% of the world's 7000 languages, including African languages, are low-resourced for NLP, i.e. they have little or no data, tools, and techniques for NLP research. For instance, only 5 out of 2965 (0.19%) authors of full-text papers in the ACL Anthology extracted from the 5 major conferences in 2018 (ACL, NAACL, EMNLP, COLING and CoNLL) are affiliated with African institutions. In this work, we discuss our effort toward building a standard machine translation benchmark dataset for Igbo, one of the 3 major Nigerian languages. Igbo is spoken by more than 50 million people globally, with over 50% of the speakers in southeastern Nigeria. Igbo is low-resourced, although there have been some efforts toward developing IgboNLP such as part-of-speech tagging and diacritic restoration

Lancaster E-Prints

Building a semantically annotated corpus of clinical texts

Author: Andrea Setzer
Angus Roberts
Denny
Franzén
Friedman
Gennari
George Demetriou
Hersh
Hripcsak
Ian Roberts
Kim
Lindberg
Mark Hepple
Meystre
Pestian
Robert Gaizauskas
Roberts
Tanabe
Yikun Guo
Publication venue: Elsevier BV
Publication date: 01/10/2009

In this paper, we describe the construction of a semantically annotated corpus of clinical texts for use in the development and evaluation of systems for automatically extracting clinically significant information from the textual component of patient records. The paper details the sampling of textual material from a collection of 20,000 cancer patient records, the development of a semantic annotation scheme, the annotation methodology, the distribution of annotations in the final corpus, and the use of the corpus for development of an adaptive information extraction system. The resulting corpus is the most richly semantically annotated resource for clinical text processing built to date, whose value has been demonstrated through its use in developing an effective information extraction system. The detailed presentation of our corpus construction and annotation methodology will be of value to others seeking to build high-quality semantically annotated corpora in biomedical domains

Elsevier - Publisher Connector

Crossref

White Rose Research Online

D-Tree substitution grammars

Author: Candito Marie-Hélène
David Weir
Harley Heidi
Hepple Mark
K. Vijay-Shanker
Owen Rambow
Rambow Owen
Schabes Yves
Vijay-Shanker K.
Publication venue: MIT Press - Journals
Publication date: 01/01/2001

There is considerable interest among computational linguists in lexicalized grammatical frameworks; lexicalized tree adjoining grammar (LTAG) is one widely studied example. In this paper, we investigate how derivations in LTAG can be viewed not as manipulations of trees but as manipulations of tree descriptions. Changing the way the lexicalized formalism is viewed raises questions as to the desirability of certain aspects of the formalism. We present a new formalism, d-tree substitution grammar (DSG). Derivations in DSG involve the composition of d-trees, special kinds of tree descriptions. Trees are read off from derived d-trees. We show how the DSG formalism, which is designed to inherit many of the characteristics of LTAG, can be used to express a variety of linguistic analyses not available in LTAG

CiteSeerX

Crossref

Sussex Research Online

A Web Service for Biomedical Term Look-Up

Author: Biemann
Chen
Cunningham
Curran
Dalli
Ferris
Gaizauskas
Hahn
Harkema
Henk Harkema
Hirschman
Humphreys
Ian Roberts
Mark Hepple
Quasthoff
Rob Gaizauskas
Swanson
The Gene Ontology Consortium
Wain
Publication venue: Hindawi Publishing Corporation
Publication date: 01/01/2005

Recent years have seen a huge increase in the amount of biomedical information that is available in electronic format. Consequently, for biomedical researchers wishing to relate their experimental results to relevant data lurking somewhere within this expanding universe of on-line information, the ability to access and navigate biomedical information sources in an efficient manner has become increasingly important. Natural language and text processing techniques can facilitate this task by making the information contained in textual resources such as MEDLINE more readily accessible and amenable to computational processing. Names of biological entities such as genes and proteins provide critical links between different biomedical information sources and researchers' experimental data. Therefore, automatic identification and classification of these terms in text is an essential capability of any natural language processing system aimed at managing the wealth of biomedical information that is available electronically. To support term recognition in the biomedical domain, we have developed Termino, a large-scale terminological resource for text processing applications, which has two main components: first, a database into which very large numbers of terms can be loaded from resources such as UMLS, and stored together with various kinds of relevant information; second, a finite state recognizer, for fast and efficient identification and mark-up of terms within text. Since many biomedical applications require this functionality, we have made Termino available to the community as a web service, which allows for its integration into larger applications as a remotely located component, accessed through a standardized interface over the web

Crossref

Directory of Open Access Journals

PubMed Central

White Rose Research Online

Mining clinical relationships from patient narratives

Author: A Rector
A Roberts
Angus Roberts
C Blaschke
C Friedman
C Giuliano
C Grover
C Nédellec
CB Ahlers
D Klein
D Lindberg
D Zelenko
Defense Advanced Research Projects Agency
G Doddington
G Zhou
H Cunningham
H Harkema
J Pustejovsky
J Thomas
K Fundel
M Goadrich
Mark Hepple
N Chinchor
N Sager
P Zweigenbaum
R Bunescu
R Gaizauskas
RC Bunescu
Robert Gaizauskas
S Katrenko
S Miller
S Pakhomov
T Rindflesch
T Wang
TC Rindflesch
U Hahn
W Chapman
Y Li
Y Lussier
Yikun Guo
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: The Clinical E-Science Framework (CLEF) project has built a system to extract clinically significant information from the textual component of medical records in order to support clinical research, evidence-based healthcare and genotype-meets-phenotype informatics. One part of this system is the identification of relationships between clinically important entities in the text. Typical approaches to relationship extraction in this domain have used full parses, domain-specific grammars, and large knowledge bases encoding domain knowledge. In other areas of biomedical NLP, statistical machine learning (ML) approaches are now routinely applied to relationship extraction. We report on the novel application of these statistical techniques to the extraction of clinical relationships.

Results: We have designed and implemented an ML-based system for relation extraction, using support vector machines, and trained and tested it on a corpus of oncology narratives hand-annotated with clinically important relationships. Over a class of seven relation types, the system achieves an average F1 score of 72%, only slightly behind an indicative measure of human inter-annotator agreement on the same task. We investigate the effectiveness of different features for this task, how extraction performance varies between inter- and intra-sentential relationships, and examine the amount of training data needed to learn various relationships.

Conclusion: We have shown that it is possible to extract important clinical relationships from text, using supervised statistical ML techniques, at levels of accuracy approaching those of human annotators. Given the importance of relation extraction as an enabling technology for text mining and given also the ready adaptability of systems based on our supervised learning approach to other clinical relationship extraction tasks, this result has significance for clinical text mining more generally, though further work to confirm our encouraging results should be carried out on a larger sample of narratives and relationship types

Crossref

Springer - Publisher Connector

PubMed Central

White Rose Research Online

Aberrant Mitochondrial Homeostasis in the Skeletal Muscle of Sedentary Older Adults

Author: A Barrientos
A Safdar
Adeel Safdar
AM Abbatecola
AM Lezza
B Chakravarti
BD Roy
BS Tseng
BW Penninx
C Franceschi
C Handschin
C Leeuwenburgh
C Ling
C Zucchini
CP Fischer
D Boffoli
D Chretien
D Harman
DA Mahler
DC Wright
DR Berk
E Bua
E Roddy
E Topinkova
EA Bua
EJ Lesnefsky
F Cardellach
G Parise
GC Kujoth
H Klitgaard
H Pilegaard
HB Hubert
HC Lee
I Trounce
J Aiken
J St-Pierre
J Wanagat
Jan J. Kaczor
JM Bauer
JR Ruiz
Justin deBeer
JW Bijlsma
KG Manton
KM Humphries
L Fakhrzadeh
L Ferrucci
LA Loeb
LJ Melton 3rd
LL Ji
LV Thompson
M Capasso
M Cesari
M Fransen
M Higuchi
M Pahor
M Sandri
M Tarnopolsky
MA Albert
MA Fiatarone
MA Rogers
MA Tarnopolsky
Mark A. Tarnopolsky
Mazen J. Hamadeh
ME Patti
ML Hamilton
NE Thomas
NJ Bosomworth
NK Chokshi
OE Rooyackers
R Song
R Wilkie
RE Hubbard
RF Loeser
RH Hsieh
RM Anderson
RS Sohal
RT Hepple
S Melov
Sandeep Raha
SE Gabriel
SG Leveille
Sudha Agarwal
SW Trappe
SX Leng
TD Spector
TK Ali
TW Buford
UF Rasmussen
V Demicheli
W Hollmann
WJ Strawbridge
WR Frontera
X Shi
Z Cao
Publication venue: Public Library of Science
Publication date: 01/01/2010

The role of mitochondrial dysfunction and oxidative stress has been extensively characterized in the aetiology of sarcopenia (aging-associated loss of muscle mass) and muscle wasting as a result of muscle disuse. What remains less clear is whether the decline in skeletal muscle mitochondrial oxidative capacity is purely a function of the aging process or if the sedentary lifestyle of older adult subjects has confounded previous reports. The objective of the present study was to investigate if a recreationally active lifestyle in older adults can conserve skeletal muscle strength and functionality, chronic systemic inflammation, mitochondrial biogenesis and oxidative capacity, and cellular antioxidant capacity. To that end, muscle biopsies were taken from the vastus lateralis of young and age-matched recreationally active older and sedentary older men and women (N = 10/group; ♀ = ♂). We show that a physically active lifestyle is associated with the partial compensatory preservation of mitochondrial biogenesis, and cellular oxidative and antioxidant capacity in skeletal muscle of older adults. Conversely a sedentary lifestyle, associated with osteoarthritis-mediated physical inactivity, is associated with reduced mitochondrial function, dysregulation of cellular redox status and chronic systemic inflammation that renders the skeletal muscle intracellular environment prone to reactive oxygen species-mediated toxicity. We propose that an active lifestyle is an important determinant of quality of life and molecular progression of aging in skeletal muscle of the elderly, and is a viable therapy for attenuating and/or reversing skeletal muscle strength declines and mitochondrial abnormalities associated with aging

CiteSeerX

Crossref

Directory of Open Access Journals

PubMed Central